Source routing with fabric switches in an Ethernet fabric network

ABSTRACT

In one embodiment, a system includes a network fabric having a plurality of fabric switches interconnected in the network fabric and a switch controller having logic adapted to configure the network fabric, determine one or more paths through the network fabric between any two hosts connected thereto, and create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto. In another embodiment, a method includes receiving or creating a packet using a NIC of a host connected to a network fabric having a plurality of fabric switches interconnected therein, determining a path through the network fabric by consulting a source-routing table stored to the host, storing source-routing information to a packet header for the packet, the source-routing information including the path, and sending the packet to a first device or hop indicated by the path in the source-routing information.

BACKGROUND

The present invention relates to data center infrastructure, and more particularly, this invention relates to reducing the overhead associated with using look-up tables in fabric switches to reduce latency.

A switching processor, such as a switching application specific integrated circuit (ASIC), may be used to choose a port to which to send received network packets. Typically, a look-up table is utilized to choose which port to send a received packet to, based on a destination address designated in a header of the received packet. However, as fabric networks grow larger, these look-up tables may encompass vast amounts of data, which causes latency in using the look-up table to determine an egress port to forward packets to. Accordingly, it would be beneficial to have a method to reduce the overhead associated with using look-up tables in fabric switches in order to reduce fabric latency.

SUMMARY

In one embodiment, a system for source routing packets includes a network fabric having a plurality of fabric switches interconnected in the network fabric and a switch controller having logic adapted to configure the network fabric, determine one or more paths through the network fabric between any two hosts connected thereto, and create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto.

According to another embodiment, a computer program product for source routing packets includes a computer readable storage medium having program code embodied therewith, the program code readable/executable by a switch controller to: configure a network fabric having a plurality of fabric switches interconnected in the network fabric, determine one or more paths through the network fabric between any two hosts connected thereto, and create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto.

In another embodiment, a method for source routing packets includes receiving or creating a packet using a network interface card (NIC) of a host connected to a network fabric having a plurality of fabric switches interconnected therein, determining a path through the network fabric by consulting a source-routing table stored to the host, storing source-routing information to a packet header for the packet, the source-routing information including the path, and sending the packet to a first device or hop indicated by the path in the source-routing information.

In yet another embodiment, a method for source routing packets includes receiving a packet, receiving source-routing information with a fabric switch interconnected to other fabric switches in a network fabric, the source-routing information being sent from a switch controller, storing the source-routing information to a source-routing table that indicates a sequence of devices or hops between the fabric switch and each known destination address in the network fabric, determining a next device or hop in a path through the network fabric by consulting the source-routing table, storing a portion of the source-routing information to a packet header for the packet, the portion of the source-routing information including at least a portion of the path, and sending the packet to the next device or hop indicated by the at least the portion of the path in the portion of the source-routing information.

Other aspects and embodiments of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrate by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 illustrates a network architecture, in accordance with one embodiment.

FIG. 2 shows a representative hardware environment that may be associated with the servers and/or clients of FIG. 1, in accordance with one embodiment.

FIG. 3 shows a system for source routing packets, according to one embodiment.

FIG. 4 shows an exemplary path through a network fabric, according to one embodiment.

FIG. 5A shows an exemplary frame format for a packet having source-routing information, according to one embodiment.

FIG. 5B is an exemplary tag protocol identifier, according to one embodiment.

FIG. 6 is a flowchart of a method, according to one embodiment.

FIG. 7 is a flowchart of a method, according to one embodiment.

DETAILED DESCRIPTION

The following description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.

Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc.

It must also be noted that, as used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless otherwise specified.

In one general embodiment, a system for source routing packets includes a network fabric having a plurality of fabric switches interconnected in the network fabric and a switch controller having logic adapted to configure the network fabric, determine one or more paths through the network fabric between any two hosts connected thereto, and create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto.

According to another general embodiment, a computer program product for source routing packets includes a computer readable storage medium having program code embodied therewith, the program code readable/executable by a switch controller to: configure a network fabric having a plurality of fabric switches interconnected in the network fabric, determine one or more paths through the network fabric between any two hosts connected thereto, and create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto.

In another general embodiment, a method for source routing packets includes receiving or creating a packet using a network interface card (NIC) of a host connected to a network fabric having a plurality of fabric switches interconnected therein, determining a path through the network fabric by consulting a source-routing table stored to the host, storing source-routing information to a packet header for the packet, the source-routing information including the path, and sending the packet to a first device or hop indicated by the path in the source-routing information.

In yet another general embodiment, a method for source routing packets includes receiving a packet, receiving source-routing information with a fabric switch interconnected to other fabric switches in a network fabric, the source-routing information being sent from a switch controller, storing the source-routing information to a source-routing table that indicates a sequence of devices or hops between the fabric switch and each known destination address in the network fabric, determining a next device or hop in a path through the network fabric by consulting the source-routing table, storing a portion of the source-routing information to a packet header for the packet, the portion of the source-routing information including at least a portion of the path, and sending the packet to the next device or hop indicated by the at least the portion of the path in the portion of the source-routing information.

By using a switch controller, such as a controller operating OpenFlow software (an OpenFlow Controller) or a switch controller that operates according to software-defined network (SDN) standards, a plurality of switches in a network fabric which are capable of communicating with the switch controller may be informed of desirable paths along which to forward received packets in order to best utilize the network fabric. To accomplish this, intelligence or functionality may be built into the switch controller to determine paths through the network fabric and to deliver these desired paths to individual switches in the network fabric that are compliant with whatever software the switch controller utilizes. In addition, in one approach, when the switch controller operates according to OpenFlow and/or SDN standards, the switches may be OpenFlow and/or SDN compliant in order to utilize the source routing techniques described herein.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as “logic,” a “circuit,” “module,” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a non-transitory computer readable storage medium. A non-transitory computer readable storage medium may be, for example, but not limited to, a system, apparatus, device, or any suitable combination of the foregoing which may rely on any suitable technology types, such as electronic, magnetic, optical, electromagnetic, infrared, semiconductor, etc. More specific examples (a non-exhaustive list) of the non-transitory computer readable storage medium include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), a Blu-ray disc read-only memory (BD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a non-transitory computer readable storage medium may be any tangible medium that is capable of containing, or storing a program or application for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a non-transitory computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device, such as an electrical connection having one or more wires, an optical fibre, etc.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fibre cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer or server may be connected to the user's computer through any type of network, including a local area network (LAN), storage area network (SAN), and/or a wide area network (WAN), or the connection may be made to an external computer, for example through the Internet using an Internet Service Provider (ISP).

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatuses (systems), and computer program products according to various embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that may direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

FIG. 1 illustrates a network architecture 100, in accordance with one embodiment. As shown in FIG. 1, a plurality of remote networks 102 are provided including a first remote network 104 and a second remote network 106. A gateway 101 may be coupled between the remote networks 102 and a proximate network 108. In the context of the present network architecture 100, the networks 104, 106 may each take any form including, but not limited to a LAN, a WAN such as the Internet, public switched telephone network (PSTN), internal telephone network, etc.

In use, the gateway 101 serves as an entrance point from the remote networks 102 to the proximate network 108. As such, the gateway 101 may function as a router, which is capable of directing a given packet of data that arrives at the gateway 101, and a switch, which furnishes the actual path in and out of the gateway 101 for a given packet.

Further included is at least one data server 114 coupled to the proximate network 108, and which is accessible from the remote networks 102 via the gateway 101. It should be noted that the data server(s) 114 may include any type of computing device/groupware. Coupled to each data server 114 is a plurality of user devices 116. Such user devices 116 may include a desktop computer, laptop computer, handheld computer, printer, and/or any other type of logic-containing device. It should be noted that a user device 111 may also be directly coupled to any of the networks, in some embodiments.

A peripheral 120 or series of peripherals 120, e.g., facsimile machines, printers, scanners, hard disk drives, networked and/or local storage units or systems, etc., may be coupled to one or more of the networks 104, 106, 108. It should be noted that databases and/or additional components may be utilized with, or integrated into, any type of network element coupled to the networks 104, 106, 108. In the context of the present description, a network element may refer to any component of a network.

According to some approaches, methods and systems described herein may be implemented with and/or on virtual systems and/or systems which emulate one or more other systems, such as a UNIX system which emulates an IBM z/OS environment, a UNIX system which virtually hosts a MICROSOFT WINDOWS environment, a MICROSOFT WINDOWS system which emulates an IBM z/OS environment, etc. This virtualization and/or emulation may be enhanced through the use of VMWARE software, in some embodiments.

In more approaches, one or more networks 104, 106, 108, may represent a cluster of systems commonly referred to as a “cloud.” In cloud computing, shared resources, such as processing power, peripherals, software, data, servers, etc., are provided to any system in the cloud in an on-demand relationship, thereby allowing access and distribution of services across many computing systems. Cloud computing typically involves an Internet connection between the systems operating in the cloud, but other techniques of connecting the systems may also be used, as known in the art.

FIG. 2 shows a representative hardware environment associated with a user device 116 and/or server 114 of FIG. 1, in accordance with one embodiment. FIG. 2 illustrates a typical hardware configuration of a workstation having a central processing unit (CPU) 210, such as a microprocessor, and a number of other units interconnected via one or more buses 212 which may be of different types, such as a local bus, a parallel bus, a serial bus, etc., according to several embodiments.

The workstation shown in FIG. 2 includes a Random Access Memory (RAM) 214, Read Only Memory (ROM) 216, an I/O adapter 218 for connecting peripheral devices such as disk storage units 220 to the one or more buses 212, a user interface adapter 222 for connecting a keyboard 224, a mouse 226, a speaker 228, a microphone 232, and/or other user interface devices such as a touch screen, a digital camera (not shown), etc., to the one or more buses 212, communication adapter 234 for connecting the workstation to a communication network 235 (e.g., a data processing network) and a display adapter 236 for connecting the one or more buses 212 to a display device 238.

The workstation may have resident thereon an operating system such as the MICROSOFT WINDOWS Operating System (OS), a MAC OS, a UNIX OS, etc. It will be appreciated that a preferred embodiment may also be implemented on platforms and operating systems other than those mentioned. A preferred embodiment may be written using JAVA, XML, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology. Object oriented programming (OOP), which has become increasingly used to develop complex applications, may be used.

Now referring to FIG. 3, a system 300 is shown according to one embodiment, which has a plurality of fabric switches 304 interconnected in a network fabric 302, each of the fabric switches 304 being connected to one another via connections 306. Each fabric switch 304 is connected, directly or indirectly, to a switch controller 308 (as denoted by dashed line connection 310 between the switch controller 308 and the network fabric 302). The switch controller 308 is capable of receiving information from each of the fabric switches 304 and is capable of sending information and/or commands to the fabric switches 304.

According to one embodiment, the switch controller 308 may operate according to OpenFlow and/or SDN standards, and each fabric switch 304 may be OpenFlow and/or SDN compliant. In other embodiments, the switch controller 308 may utilize a different application capable of controlling the fabric switches 304 as would be known by one of skill in the art, such as Beacon, Jaxon, NOX, POX, Maestro, etc.

In addition, the network fabric 302 may be a physical and/or virtual network fabric (a network fabric which utilizes only physical devices, a network fabric which only utilizes virtual devices, and/or a network fabric which utilizes a combination of physical and virtual devices). In addition, each of the fabric switches 304 may be a physical switch, a virtual switch, or a combination thereof.

The system 300 may further comprise one or more hosts 312 connected to the network fabric 302 via one or more fabric switches 304 via connections 314. Any of the hosts 312 may be a physical host, a virtual host, or a combination thereof. The hosts may be any type of device capable of communicating with the network fabric 302, such as another network, a server, a controller, a workstation, an end station, etc. Each host 312 may include an interface for communicating with the network fabric 302 and one or more fabric switches 304 therein. Each of the hosts 312 is unaware of the physical components of the network fabric 302 and instead views the network fabric 302 as a single entity to which a connection may be made, in one approach. Of course, each host 312 is actually connected to at least one physical fabric switch 304 within the network fabric 302. The host 312 may be connected to multiple fabric switches 304 in the case of a Link Aggregation (LAG) connection.

The switch controller 308 may comprise logic adapted to analyze and configure the network fabric 302 such that there are one or more non-looping paths through the network fabric 302 between any two hosts 312 or other end stations connected to the network fabric 302. Ideally, the logic may be able to determine multiple paths through the network fabric 302, in order to provide redundancy, increased throughput, and decreased latency, among other advantages.
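By way of illustration only, the determination of non-looping paths described above may be sketched as a search over a connectivity graph. The following Python sketch is not the controller logic of any embodiment; the topology (including a second intermediate switch, S4) and the names TOPOLOGY and loop_free_paths are hypothetical and were chosen only so that more than one path exists between the two hosts.

```python
from typing import Dict, List

# Hypothetical connectivity graph (an assumption for this sketch, not FIG. 3):
# two disjoint switch paths exist between Host1 and Host2.
TOPOLOGY: Dict[str, List[str]] = {
    "Host1": ["S1"],
    "S1": ["Host1", "S2", "S4"],
    "S2": ["S1", "S3"],
    "S4": ["S1", "S3"],
    "S3": ["S2", "S4", "Host2"],
    "Host2": ["S3"],
}


def loop_free_paths(adj: Dict[str, List[str]], src: str, dst: str) -> List[List[str]]:
    """Return every path from src to dst that visits no node twice."""
    found: List[List[str]] = []
    stack = [(src, [src])]  # (current node, path taken so far)
    while stack:
        node, path = stack.pop()
        if node == dst:
            found.append(path)
            continue
        for neighbor in adj.get(node, []):
            if neighbor not in path:  # skipping visited nodes prevents loops
                stack.append((neighbor, path + [neighbor]))
    # Shortest paths first, ties broken by name so the order is deterministic.
    return sorted(found, key=lambda p: (len(p), p))


if __name__ == "__main__":
    for p in loop_free_paths(TOPOLOGY, "Host1", "Host2"):
        print(" -> ".join(p))
```

An exhaustive search of this kind grows quickly with fabric size; it is shown only because it makes the non-looping property explicit, and a practical controller would likely bound the number of paths retained per host pair.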

There are many factors to consider in determining paths through the network fabric 302. Some factors include the number of layers in the fabric, L, the number of nodes per layer, N_L, the switch controller's topology and connectivity graph (and whether the switch controller 308 is capable of globalizing the routing decisions), etc.

Furthermore, in order for multipathing to take place in the network fabric 302, the multipathing may take place in-order via Equal Cost Multi-Pathing (ECMP) and/or LAG hashing (and what type of hash is used may be a consideration, such as an industry standard, a legacy system, etc.). In addition, the multipathing may support high performance operation via adaptive routing.
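As a non-limiting sketch of how hashing may keep multipathing in-order, a flow identifier may be hashed to select one of several equal-cost paths, so that every packet of the same flow follows the same path. The function name pick_path and the use of a CRC-32 over the source and destination addresses are assumptions made for this illustration; the description does not specify a particular hash.

```python
import zlib
from typing import List, Sequence


def pick_path(paths: Sequence[List[str]], src_addr: str, dst_addr: str) -> List[str]:
    """Select one equal-cost path for a flow.

    The flow key here is only the (source, destination) address pair, which is
    an assumption for this sketch. Because the hash is deterministic, packets
    of one flow always map to the same path and therefore stay in order.
    """
    if not paths:
        raise ValueError("at least one path is required")
    key = f"{src_addr}->{dst_addr}".encode("utf-8")
    return paths[zlib.crc32(key) % len(paths)]


if __name__ == "__main__":
    equal_cost = [["S1", "S2", "S3"], ["S1", "S4", "S3"]]
    print(pick_path(equal_cost, "Addr1", "Addr2"))
```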

Converged Enhanced Ethernet (CEE) may also be supported by the network fabric 302, such as by using Priority Flow Control (PFC) and/or Enhanced Transmission Selection (ETS) along the complete path through the network fabric 302 in addition to Quantized Congestion Notification (QCN). Additionally, link congestion may trigger a saturation tree with QCN.

In one embodiment, an interface-based path representation may be used, where a single interface to a network is used to gain perspective on the network from the point of view of that interface. This interface-based path representation may then be used to span the network fabric 302, as shown in FIG. 3. For example, Host 1 is shown connected directly to fabric switch S1. In this example, the interface for Host 1 to the network fabric 302 may be a single physical port, a virtual port, a static LAG, a dynamic LAG, or any other suitable interface between Host 1 and fabric switch S1. Also, in this example, a global forwarding table may be created, managed, updated, and utilized by the switch controller 308 to make routing decisions, for example, from the time a packet is received by fabric switch S1 from Host 1 until the packet is received by Host 2 via S3.

In one embodiment, the switch controller 308 may be consulted any time a routing decision is to be made for a packet received by any of the fabric switches 304 in the network fabric 302.

In another embodiment, each fabric switch 304 may have resident therein a source-routing table. In this case, the fabric switch 304 inserts the route information into each incoming packet that does not yet have source-routing information stored therein. One disadvantage of this approach is that a lot of redundancy in terms of routing information in the network is introduced, which makes routing updates cumbersome, since they must be done for each fabric switch 304 in the network fabric 302. One advantage of this approach is that legacy (i.e., non-source routing capable) devices and components (e.g., network interface cards (NICs), legacy switches, etc.) may be attached to the network fabric 302.

Now referring to FIG. 4, a portion 400 of the network fabric 302 is shown, with one exemplary path through the network fabric 302 shown in more detail. This path is between two hosts 312, specifically Host 1 to Host 2, and includes three fabric switches 304: S1, S2, and S3. For the sake of this description, Host 1 may be assumed to have an address of Addr1 and Host 2 an address of Addr2, while it may be assumed that fabric switch S1 is connected to Host 1 via port 4 (denoted as P4), and to fabric switch S2 via a LAG (denoted as L2). Likewise, fabric switch S2 is connected to fabric switch S1 via LAG L2. Furthermore, it may be assumed that fabric switch S2 is connected to fabric switch S3 via port 3 (denoted as P3), while fabric switch S3 is connected to fabric switch S2 via port 6 (denoted as P6) and to Host 2 via port 2 (denoted as P2), as shown in FIG. 4.

The path between Host 1 and Host 2 may be represented in each fabric switch 304 in a forwarding table, according to one embodiment, which may be stored locally to each fabric switch 304, or globally by the switch controller 308 in another embodiment.

In this example, the path would be represented as follows in the forwarding table for each fabric switch, where the destination port set is identified as [device]/[port]. Furthermore, each destination port may be a physical port, a virtual port, or a combination thereof.

Forwarding Table S1
Destination Address    Destination Port Set
Host Addr1             S1/P4
Host Addr2             S1/L2/P5, S2/P3, S3/P2

Forwarding Table S2
Destination Address    Destination Port Set
Host Addr1             S2/L2/P4, S1/P4
Host Addr2             S2/P3, S3/P2

Forwarding Table S3
Destination Address    Destination Port Set
Host Addr1             S3/P6, S2/L2/P4, S1/P4
Host Addr2             S3/P2

Therefore, when a packet is received by fabric switch S1 from Host 1, and the packet is to be forwarded to Host 2, fabric switch S1 will follow a path from S1/L2/P5 to S2/P3 to S3/P2. This is because the LAG L2 is chosen and follows port 5 out of fabric switch S1 to fabric switch S2. Likewise, in the reverse direction, fabric switch S2 chooses port 4 in the LAG L2. Furthermore, this forwarding logic may take into account a hashing algorithm information exchange protocol, and in one approach, only the edge switches (switches S1 and S3 in this example) may maintain forwarding tables, and intermediate switches (switch S2 in this example) may simply follow the source route in the packet.
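For illustration, the three forwarding tables of this example may be held as simple mappings from destination address to an ordered destination port set. The Python names below (FORWARDING, devices_on_path, tables_agree) are hypothetical. The sketch also checks a property visible in the tables: the entry held by each downstream switch equals the remaining hops of the upstream entry.

```python
from typing import Dict, List

# The forwarding tables of the FIG. 4 example, keyed by switch and then by
# destination address. Each hop is written as [device]/[port].
FORWARDING: Dict[str, Dict[str, List[str]]] = {
    "S1": {"Addr1": ["S1/P4"], "Addr2": ["S1/L2/P5", "S2/P3", "S3/P2"]},
    "S2": {"Addr1": ["S2/L2/P4", "S1/P4"], "Addr2": ["S2/P3", "S3/P2"]},
    "S3": {"Addr1": ["S3/P6", "S2/L2/P4", "S1/P4"], "Addr2": ["S3/P2"]},
}


def devices_on_path(switch: str, dst: str) -> List[str]:
    """List the devices a packet traverses from `switch` toward `dst`."""
    return [hop.split("/", 1)[0] for hop in FORWARDING[switch][dst]]


def tables_agree(switch: str, dst: str) -> bool:
    """Check that every switch on the path stores the remaining hops of the path."""
    route = FORWARDING[switch][dst]
    for index, hop in enumerate(route):
        device = hop.split("/", 1)[0]
        if FORWARDING[device][dst] != route[index:]:
            return False
    return True


if __name__ == "__main__":
    print(devices_on_path("S1", "Addr2"))
    print(tables_agree("S1", "Addr2"), tables_agree("S3", "Addr1"))
```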

In another embodiment, referring again to FIG. 3, the switch controller 308 may have certain physical topology information available with which to construct the paths through the network fabric 302. In that topology, the fabric switches 304 and the physical connectivity therebetween are shown. In using the topology information, the switch controller 308 is adapted to determine the ARP entries associated with either its local ARP connectivity or general subnet distribution in the network fabric 302. By combining this information, the switch controller 308 creates the source-routing tables and may offload them to any devices capable of source routing in the network fabric 302 or connected thereto.
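One possible way to picture the combination of topology information and address information is sketched below in Python. The sketch assumes the port assignments of the FIG. 4 example; the names EGRESS, ADDRESSES, shortest_node_path, and build_table are hypothetical, and the breadth-first search is merely a stand-in for whatever path computation the switch controller actually performs.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Egress port used by a device to reach a directly connected neighbor,
# taken from the FIG. 4 example.
EGRESS: Dict[Tuple[str, str], str] = {
    ("S1", "Host1"): "P4",
    ("S1", "S2"): "L2/P5",
    ("S2", "S1"): "L2/P4",
    ("S2", "S3"): "P3",
    ("S3", "S2"): "P6",
    ("S3", "Host2"): "P2",
}

# ARP-like mapping from destination address to the attached host (assumed).
ADDRESSES: Dict[str, str] = {"Addr1": "Host1", "Addr2": "Host2"}


def shortest_node_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over EGRESS; returns the node sequence or None."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for (device, neighbor) in EGRESS:
            if device == node and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    if goal not in previous:
        return None
    path: List[str] = []
    cursor: Optional[str] = goal
    while cursor is not None:
        path.append(cursor)
        cursor = previous[cursor]
    return path[::-1]


def build_table(switch: str) -> Dict[str, List[str]]:
    """Build the source-routing table a controller might offload to `switch`."""
    table: Dict[str, List[str]] = {}
    for address, host in ADDRESSES.items():
        nodes = shortest_node_path(switch, host)
        if nodes is None:
            continue  # destination unreachable from this switch
        table[address] = [f"{a}/{EGRESS[(a, b)]}" for a, b in zip(nodes, nodes[1:])]
    return table


if __name__ == "__main__":
    for name in ("S1", "S2", "S3"):
        print(name, build_table(name))
```

Under these assumptions, the generated entries match the forwarding tables given for fabric switches S1, S2, and S3.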

The switch controller 308 may offload the forwarding table information to just the fabric switches 304, or to the fabric switches 304 and the end hosts 312. In this embodiment, each end host 312 may have a forwarding table that includes source-routing information for packets being sent to other end hosts connected to the network fabric 302.

In this approach, a network interface card (NIC) of the host 312 or some other component or device may produce the source routing for each forwarded packet. In this approach, each NIC has a source-routing table that indicates the sequence of turns or hops to each known destination, and inserts the route into a frame of the packet upon injection into the network fabric 302. Still, hardware support from the fabric switches 304 is beneficial in order for this approach to function properly, because when each fabric switch 304 has the capability to inspect the frame for the presence of a source route, the various fabric switches 304 in the path may each make a routing decision based on that source-routed information. Otherwise, when a fabric switch 304 which lacks source-routing capability encounters a packet, it will only be able to send the packet along according to some other information, without the benefit of the source-routing information which indicates a chosen route through the network. In addition, each fabric switch 304 may still have a traditional routing table to handle non-source-routed frames, regardless of whether the fabric switch 304 has the capability to handle source-routed frames.

In the case where a switch lacks the ability to handle source-routed frames, the fabric switch 304 may simply rely on a traditional routing table with which to determine a next hop and egress port. In this case, one or more devices within the path may lack the ability to handle the source-routed frames, but the packet may still be forwarded without problems until it reaches another fabric switch 304 or device in the path which is capable of handling a source-routed frame, where it will once again be handled according to the source routing. Each device in the network fabric may be a virtual device, a physical device, or a combination thereof. Furthermore, each egress port may be a physical port, a virtual port, or a combination thereof.
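The per-switch decision described above may be sketched as follows. This Python sketch is illustrative only: the frame is modeled as a dictionary, the function name egress_port is hypothetical, and treating S2 as a switch without source-routing capability is an assumption made solely to exercise the fallback to a traditional routing table.

```python
from typing import Dict


def egress_port(switch: str, sr_capable: bool,
                routing_table: Dict[str, str], frame: Dict) -> str:
    """Choose the egress port for `frame` at `switch`.

    A source-routing capable switch uses its own hop in the frame's route when
    one is present; otherwise the traditional routing table (destination
    address -> port) is consulted.
    """
    if sr_capable:
        for hop in frame.get("route", []):
            device, _, port = hop.partition("/")
            if device == switch:
                return port
    return routing_table[frame["dst"]]


if __name__ == "__main__":
    frame = {"dst": "Addr2", "route": ["S1/L2/P5", "S2/P3", "S3/P2"]}
    print(egress_port("S1", True, {"Addr2": "L2/P5"}, frame))   # uses the source route
    print(egress_port("S2", False, {"Addr2": "P3"}, frame))     # uses the routing table
    print(egress_port("S3", True, {"Addr2": "P2"}, frame))      # source route again
```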

In any embodiment described herein, each device in the network fabric 302 and connected thereto capable of source routing may know each of the other devices to which it is connected which are source-routing capable. Accordingly, each source routing capable device is able to determine when it is forwarding a packet to a device which is not capable of source routing. In this case, the source-routing information may be stripped from the packet, and may appear as a standard packet to the receiving device.

Referring again to FIG. 4, the exemplary path through the network fabric 302 is again referenced. This path is between two hosts 312, specifically Host 1 to Host 2, and includes three fabric switches 304: S1, S2, and S3. In this approach, however, Host 1 and Host 2 have the forwarding tables, and the switches are instructed to forward packets according to the source-routed information included therein.

In this approach, the path would be represented as follows in the forwarding table for each host 312, where the destination port set is identified as [device]/[port].

Forwarding Table Host 1 Destination Address Destination Port Set HostAddr1 Internal Host Addr2 S1/L2/P5, S2/P3, S1/P2

Forwarding Table Host 2
Destination Address    Destination Port Set
Host Addr1             S3/P6, S2/L2/P4, S1/P4
Host Addr2             Internal
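By way of example only, a host-side forwarding table of this kind might be held in memory as a mapping from destination address to an ordered list of hops. The following Python sketch parses the [device]/[port] notation used above. The Hop type, the parse_port_set function, and the choice to treat everything after the first "/" as the port designation (so that "L2/P5" is kept whole) are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Hop(NamedTuple):
    device: str  # device identifier, e.g. "S1"
    port: str    # remainder of the entry, e.g. "P3" or "L2/P5" (assumed to name the egress port)


def parse_port_set(text: str) -> List[Hop]:
    """Split a destination port set such as "S1/L2/P5, S2/P3" into ordered hops."""
    hops = []
    for entry in text.split(","):
        device, _, port = entry.strip().partition("/")
        hops.append(Hop(device, port))
    return hops


# Host 1's table as listed above; None stands for the "Internal" entry.
HOST1_TABLE: Dict[str, Optional[List[Hop]]] = {
    "Host Addr1": None,
    "Host Addr2": parse_port_set("S1/L2/P5, S2/P3, S1/P2"),
}

route = HOST1_TABLE["Host Addr2"]
first_hop = route[0]  # the first device or hop the packet is sent to
```

Because the hops are stored in order, the first element identifies where the host injects the packet and the remaining elements may be copied into the packet as its source route.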

The source-routing information may be encapsulated in the packets in various different ways, and may depend on any protocols and/or network types that the packet adheres to. In one embodiment, the source-routing information may be included in a header of a packet. One example of this is shown in FIG. 5A.

Referring to FIG. 5A, a frame format 500 for an exemplary packet is shown according to one embodiment. The frame format 500 includes a destination media access control address (DMAC) 502, a source MAC address (SMAC) 504, a source routing tag (SR-Tag) 506 that includes the source-routing information, a service tag (S-Tag) 508, a customer tag (C-Tag) 510, an ethertype descriptor 512, a payload 514 for the packet, and an optional frame check sequence (FCS) 516.

In one approach, the DMAC 502, SMAC 504, S-Tag 508, C-Tag 510, payload 514 and FCS 516 may behave and be utilized in the same manner as is typical for any packet adhering to any of various IEEE standards; however, the ethertype descriptor 512 may take into account the length of the SR-Tag 506.

Regarding the SR-Tag 506, when a switch is not source-routing capable, the SR-Tag 506 may be omitted from the packet in order for the switch to understand the information in the header of the packet. This may be performed by any device which forwards the packet to a non-source-routing compliant device, such as a legacy switch. Then, when the packet is received from this legacy device by another source-routing compliant device, the SR-Tag 506 may be reinserted into the header and the source-routing information may be restored from this hop forward to the destination in one embodiment, or the entire source-routing information from the source to the destination may be added to the SR-Tag 506 in an alternate embodiment.

The SR-Tag 506, in some embodiments, may include source-routing information, enforcement options, and hop count information.

A Tag Protocol Identifier (TPID), such as the TPID 520 shown in FIG. 5B according to one embodiment, may be used to denote the SR-Tag 506. The TPID 520 to denote an SR-Tag may have the code 0xD2D2, but is not so constrained, as any available code may be used to denote an SR-Tag. The SR-Tag 506 may be formatted to include a series of strings, each string having a predetermined length. In this example, the strings are 16 bits long, but any length may be used, such as 8 bits, 24 bits, 32 bits, etc. The first string may be designated for the Enforcement Options 522 and the Hop Count 524, with each field occupying half the string length (8 bits each) or some other division.

The Enforcement Options 522 may be used to indicate any enforcement criteria for a particular packet. For example, if the switch has a forwarding table which is inconsistent with a next hop stipulated in the source-routing information, then the switch may be directed to overwrite its own forwarding table with the source-routing information, or the source-routing information may be rewritten based on the switch's local forwarding table. This decision may be indicated in the Enforcement Options 522. These Enforcement Options 522 also may dictate whether the source-routing information is strictly followed or if it may be bypassed. Other traffic management options may also be present, such as which of various available ports to choose to egress the packet (such as in a LAG or some other suitable arrangement). This is possible because the egress port is a logical interface, and a logical interface may present more than one physical port to choose from. The Enforcement Options 522 may indicate that a port with the lowest latency should be chosen, or a port with higher latency but more reliability, or some other traffic management decision that is understood by intermediate switches. Most of the instructions that may be stored in the Enforcement Options 522 may be related to reliability and/or traffic management. Some of these options may even enable or disable filtering based on the SR-Tags; because the SR-Tag is necessary in order to filter, such an option might indicate that the SR-Tag is to be retained no matter what. In another approach, the switch may use the SR-Tag if it understands it, or it may discard the SR-Tag if it does not.
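As a minimal sketch of how such options might be acted upon, the following Python fragment resolves a disagreement between the next hop stipulated in the source-routing information and a switch's local forwarding table. The bit assignments in the Enforcement class and the resolve_conflict function are purely hypothetical; the present descriptions do not define a particular encoding for the Enforcement Options 522.

```python
from enum import IntFlag
from typing import Dict


class Enforcement(IntFlag):
    """Hypothetical bit assignments, chosen only for this illustration."""
    OVERWRITE_LOCAL_TABLE = 0x01  # on conflict, the source route updates the switch's table
    RETAIN_SR_TAG = 0x02          # keep the SR-Tag in the packet no matter what


def resolve_conflict(sr_next: str, destination: str,
                     local_table: Dict[str, str], options: Enforcement) -> str:
    """Return the next hop to use when the SR-Tag and local table may disagree."""
    local_next = local_table.get(destination)
    if local_next is None or local_next == sr_next:
        return sr_next  # no inconsistency to resolve
    if options & Enforcement.OVERWRITE_LOCAL_TABLE:
        local_table[destination] = sr_next  # switch adopts the source route
        return sr_next
    # Otherwise the source-routing information is rewritten from the local table.
    return local_next


table_a = {"Host Addr2": "S2/P4"}
chosen_a = resolve_conflict("S2/P3", "Host Addr2", table_a, Enforcement.OVERWRITE_LOCAL_TABLE)

table_b = {"Host Addr2": "S2/P4"}
chosen_b = resolve_conflict("S2/P3", "Host Addr2", table_b, Enforcement(0))
```

Other options described above, such as choosing among the physical ports of a logical interface by latency or reliability, could be represented by additional bits in the same manner.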

The Hop Count 524 is used to denote which hop the packet is currently at. After the initial string having the Enforcement Options 522 and the Hop Count 524, a series of Bridge IDs 526 related to Logical Port IDs 528 and Options 530 thereof may be listed, one for each hop, shown as hop 0, hop 1, hop 2, . . . , hop N. The number of Bridge IDs 526, Logical Port IDs 528, and Options 530 may depend on the number of hops in the designated path, e.g., N. The Hop Count 524 indicates the current hop in the path where the packet is supposed to be, e.g., a number between 0 and N, counting either down from N or up from 0. If the packet is not at the indicated hop, then corrective action may be taken by the switch to correct any issues with the Hop Count 524 and/or the designated path.
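One possible form of this check is sketched below in Python, assuming that the Hop Count 524 begins at 0 and increases along the path. The check_hop_count function and its corrective action (re-synchronizing the count by locating the switch's own Bridge ID in the listed path) are hypothetical and represent only one of many corrective actions that might be taken.

```python
from typing import List, Optional, Tuple


def check_hop_count(my_bridge_id: int, hop_count: int,
                    bridge_ids: List[int]) -> Tuple[Optional[int], bool]:
    """Return (hop index to use, whether the count had to be corrected).

    The index is None when this switch does not appear in the path at all.
    """
    if 0 <= hop_count < len(bridge_ids) and bridge_ids[hop_count] == my_bridge_id:
        return hop_count, False  # packet is at the indicated hop
    if my_bridge_id in bridge_ids:
        # Assumed corrective action: adopt the position where this switch is listed.
        return bridge_ids.index(my_bridge_id), True
    return None, False  # the designated path does not include this switch


path_bridge_ids = [1, 2, 3]  # hop 0, hop 1, hop 2 (illustrative Bridge IDs)
on_track = check_hop_count(2, 1, path_bridge_ids)
resynced = check_hop_count(3, 1, path_bridge_ids)
off_path = check_hop_count(9, 1, path_bridge_ids)
```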

Each set of Bridge ID 526, Logical Port ID 528, and Options 530 thereof may be 32 bits in length, with the Bridge ID 526 being 16 bits, the Logical Port ID 528 being 12 bits, and the Options 530 being 4 bits. Of course, any other length may be used for these fields, as would be understood by one of skill in the art.
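For illustration only, the field widths given above (a 16-bit TPID of 0xD2D2, 8 bits each for the Enforcement Options and the Hop Count, and a 16-bit Bridge ID, 12-bit Logical Port ID, and 4-bit Options per hop) may be packed and unpacked as in the following Python sketch. Network byte order, the placement of the Logical Port ID in the upper 12 bits of its string, and the inference of the number of hops from the length of the tag are assumptions of this sketch rather than requirements of any embodiment.

```python
import struct
from typing import List, Tuple

SR_TPID = 0xD2D2  # example TPID code from the description

HopEntry = Tuple[int, int, int]  # (bridge_id, logical_port_id, options)


def pack_sr_tag(enforcement: int, hop_count: int, hops: List[HopEntry]) -> bytes:
    """Serialize the TPID, the first string, and one entry per hop."""
    out = struct.pack("!HBB", SR_TPID, enforcement, hop_count)
    for bridge_id, port_id, options in hops:
        if not (0 <= port_id < (1 << 12) and 0 <= options < (1 << 4)):
            raise ValueError("logical port must fit in 12 bits, options in 4 bits")
        out += struct.pack("!HH", bridge_id, (port_id << 4) | options)
    return out


def unpack_sr_tag(tag: bytes) -> Tuple[int, int, List[HopEntry]]:
    """Parse a tag produced by pack_sr_tag (the tag is passed in isolation)."""
    tpid, enforcement, hop_count = struct.unpack_from("!HBB", tag, 0)
    if tpid != SR_TPID:
        raise ValueError("not an SR-Tag")
    hops = []
    for offset in range(4, len(tag), 4):
        bridge_id, port_opts = struct.unpack_from("!HH", tag, offset)
        hops.append((bridge_id, port_opts >> 4, port_opts & 0xF))
    return enforcement, hop_count, hops


# Three hops with illustrative Bridge IDs 1-3 and logical ports 5, 3, and 2.
tag = pack_sr_tag(0x01, 0, [(1, 5, 0), (2, 3, 0), (3, 2, 0)])
decoded = unpack_sr_tag(tag)
```

With these widths, a path of three hops occupies 16 bytes including the TPID: 4 bytes for the TPID and first string, plus 4 bytes per hop.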

Now referring to FIG. 6, a flowchart of a method 600 for source routing packets is shown, according to one embodiment. The method 600 may be performed in accordance with the present invention in any of the environments depicted in FIGS. 1-5B, among others, in various embodiments. Of course, more or fewer operations than those specifically described in FIG. 6 may be included in method 600, as would be understood by one of skill in the art upon reading the present descriptions.

Each of the steps of the method 600 may be performed by any suitable component of the operating environment. For example, in one embodiment, the method 600 may be partially or entirely performed by a fabric switch, an end station, a processor (such as an ASIC, a switching ASIC, a CPU, etc.) embodied in a computer, a switch controller, a host connected to a network fabric having a plurality of fabric switches interconnected therein, etc.

As shown in FIG. 6, method 600 may initiate with operation 602, where a packet is received or created using a network interface card (NIC) of a host connected to a network fabric. The NIC in this method is source-routing capable. The network fabric includes a plurality of fabric switches interconnected therein, each fabric switch possibly being source-routing capable.

According to one embodiment, the switch controller may be adapted to operate according to OpenFlow standards, and the NIC or host may be OpenFlow compliant. In this approach, source-routing table details and rules may be received from the OpenFlow controller, as a way of programming which information is stored in the source-routing table and how that information is stored.

In operation 604, a path through the network fabric is determined by consulting a source-routing table stored to the host. The path may be chosen from many different available paths between the host and the destination address of the packet. In one embodiment, traffic may be load balanced between the host and the destination address by changing which path is selected for each new packet, stream of packets, flow, etc.
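A minimal sketch of one such load-balancing scheme follows, in which each new flow is assigned the next available path in round-robin order and subsequent packets of that flow reuse the same path. The PathSelector class, the flow identifiers, and the two alternative paths (including the switch S4) are hypothetical and appear here only for illustration; selection per packet or per stream of packets could be sketched similarly.

```python
from itertools import cycle
from typing import Dict, Hashable, Iterator, List, Tuple

Path = Tuple[str, ...]  # ordered "device/port" entries


class PathSelector:
    """Round-robin choice among the stored paths, made once per new flow."""

    def __init__(self, paths_by_destination: Dict[str, List[Path]]) -> None:
        self._rotation: Dict[str, Iterator[Path]] = {
            dst: cycle(paths) for dst, paths in paths_by_destination.items()
        }
        self._flow_path: Dict[Tuple[str, Hashable], Path] = {}

    def select(self, destination: str, flow_id: Hashable) -> Path:
        key = (destination, flow_id)
        if key not in self._flow_path:
            # New flow: take the next path in the rotation for this destination.
            self._flow_path[key] = next(self._rotation[destination])
        return self._flow_path[key]


# Two assumed alternative paths to the same destination.
PATH_A: Path = ("S1/P5", "S2/P3", "S3/P2")
PATH_B: Path = ("S1/P6", "S4/P1", "S3/P2")

selector = PathSelector({"Host Addr2": [PATH_A, PATH_B]})
first = selector.select("Host Addr2", "flow-1")
second = selector.select("Host Addr2", "flow-2")
again = selector.select("Host Addr2", "flow-1")
```

Keeping a flow on a single path in this way avoids reordering packets within the flow, while new flows are spread across the available paths.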

In a further embodiment, source-routing information may be received from a switch controller, and the source-routing information may be stored to the source-routing table, thereby allowing the host to send packets to any known destination in the network fabric without the use of a traditional look-up table.

In operation 606, source-routing information is stored to a packet header for the packet, the source-routing information comprising the path.

In a further embodiment, the source-routing information may be stored in an SR-Tag in the packet header. The SR-Tag may comprise, as described in more detail previously, an enforcement options field, a hop count indicator field for indicating a current device or hop in the path, and the source-routing information for the path, comprising a bridge indicator associated with a logical port indicator and options thereof for each device or hop in the path.

In operation 608, the packet is sent to a first device or hop indicated by the path in the source-routing information. This operation may be performed without the use of a look-up table. The first device or hop may be part of the path stored in the packet header which indicates the path through the network fabric.
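Operations 602 through 608 may be sketched end to end as follows. This Python fragment is a simplified, hypothetical illustration: the packet is represented as a dictionary, the SR-Tag as a nested dictionary, and the source_route_packet function and its field names are assumptions rather than part of any embodiment. The path shown is the one listed for Host Addr2 in the forwarding table for Host 1.

```python
from typing import Dict, List, Tuple


def source_route_packet(payload: bytes, destination: str,
                        sr_table: Dict[str, List[str]]) -> Tuple[str, dict]:
    """Return (first device to send to, packet carrying the source route)."""
    # Operation 602: a packet is created by the NIC of the host.
    packet = {"dst": destination, "payload": payload}
    # Operation 604: the path is determined from the host's source-routing table.
    path = sr_table[destination]
    # Operation 606: the path is stored to the packet header (modeled as an SR-Tag).
    packet["sr_tag"] = {"enforcement": 0, "hop_count": 0, "path": list(path)}
    # Operation 608: the packet is sent to the first device indicated by the path.
    first_device = path[0].split("/")[0]
    return first_device, packet


host1_sr_table = {"Host Addr2": ["S1/L2/P5", "S2/P3", "S1/P2"]}
first_device, tagged_packet = source_route_packet(b"hello", "Host Addr2", host1_sr_table)
```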

In more embodiments, referring again to FIG. 6, any or all operations of method 600 may be implemented in a system, a fabric switch, a device, a network, a host, a processor, and/or a computer program product.

Now referring to FIG. 7, a flowchart of another method 700 for source routing packets is shown, according to one embodiment. The method 700 may be performed in accordance with the present invention in any of the environments depicted in FIGS. 1-5B, among others, in various embodiments. Of course, more or fewer operations than those specifically described in FIG. 7 may be included in method 700, as would be understood by one of skill in the art upon reading the present descriptions.

Each of the steps of the method 700 may be performed by any suitable component of the operating environment. For example, in one embodiment, the method 700 may be partially or entirely performed by a fabric switch, an end station, a processor (such as an ASIC, a switching ASIC, a CPU, etc.) embodied in a computer, a switch controller, a host connected to a network fabric having a plurality of fabric switches interconnected therein, etc.

As shown in FIG. 7, method 700 may initiate with operation 702, where a packet is received, such as by a fabric switch in a network fabric comprising a plurality of interconnected fabric switches. In addition, one or more hosts may be connected to the network fabric. The fabric switch may be connected, directly or indirectly, to a switch controller for controlling certain functions thereof.

According to one embodiment, the switch controller may be adapted to operate according to OpenFlow standards, and the fabric switch may be OpenFlow compliant.

In operation 704, source-routing information may be received by the fabric switch, the source-routing information being sent from the switch controller.

In operation 706, the source-routing information is stored to a source-routing table that indicates a sequence of devices or hops between the fabric switch and each known destination address in the network fabric.

In operation 708, a next device or hop in a path through the network fabric is determined by consulting the source-routing table. This operation may be performed without the use of a traditional look-up table, in one approach.

In operation 710, a portion of the source-routing information is stored to a packet header for the packet, the portion of the source-routing information comprising at least a portion of the path.

In one embodiment, the portion of the source-routing information may be stored in an SR-Tag, the SR-Tag comprising an enforcement options field, a hop count indicator field for indicating a current device or hop in the path, and the portion of the source-routing information for the at least the portion of the path. The portion of the source-routing information may comprise a bridge indicator associated with a logical port indicator and options thereof for each device or hop in the path.

In operation 712, the packet is sent to the next device or hop indicated by the at least the portion of the path in the portion of the source-routing information.
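The switch-side operations 702 through 712 may likewise be sketched in Python. In this hypothetical illustration, the FabricSwitch class, its method names, and the dictionary representation of packets are assumptions, as is the convention that each stored sequence lists the devices downstream of the switch, beginning with the next device.

```python
from typing import Dict, List, Tuple


class FabricSwitch:
    """Toy model of a source-routing capable fabric switch (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sr_table: Dict[str, List[str]] = {}

    def install_routes(self, routes: Dict[str, List[str]]) -> None:
        # Operations 704 and 706: source-routing information received from the
        # switch controller is stored to the source-routing table.
        self.sr_table.update(routes)

    def forward(self, packet: dict) -> Tuple[str, dict]:
        # Operation 702: a packet without source-routing information is received.
        # Operation 708: the next device is determined from the source-routing table.
        path = self.sr_table[packet["dst"]]
        # Operation 710: the remaining portion of the path is stored to the header.
        tagged = dict(packet)
        tagged["sr_tag"] = {"enforcement": 0, "hop_count": 0, "path": list(path)}
        # Operation 712: the packet is sent to the next device in that portion.
        next_device = path[0].split("/")[0]
        return next_device, tagged


s1 = FabricSwitch("S1")
s1.install_routes({"Host Addr2": ["S2/P3", "S3/P2"]})  # as if sent by the controller
untagged = {"dst": "Host Addr2", "payload": b"hello"}
next_device, tagged = s1.forward(untagged)
```

The received packet is copied rather than modified in place, so the same untagged packet could still be handled by a traditional routing table if the switch had no entry for its destination.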

In more embodiments, referring again to FIG. 7, any or all operations of method 700 may be implemented in a system, a fabric switch, a device, a network, a host, a processor, and/or a computer program product.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of an embodiment of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
1. A system for source routing packets, the system comprising: a network fabric comprising a plurality of fabric switches interconnected in the network fabric; and a switch controller, comprising logic configured to: configure the network fabric; determine one or more paths through the network fabric between any two hosts connected thereto; create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto; store the one or more paths through the network fabric between any two hosts connected thereto to the source-routing table; and store, to a first host, at least one path through the network fabric originating from the first host in response to a determination that the first host comprises a forwarding table that includes source-routing information for packets being sent to other end hosts connected to the network fabric, wherein the at least one path comprises: a destination address corresponding to the first host connected to the network fabric; a destination address corresponding to a second host connected to the network fabric; and a destination port set representing each device or hop between the first host and the second host, each destination port comprising a device identifier and an egress port identifier for the device or hop in the path.
2. The system as recited in claim 1, wherein the switch controller is configured to operate according to software-defined network (SDN) and/or OpenFlow standards, wherein at least one of the plurality of fabric switches is SDN and/or OpenFlow compliant, wherein the network fabric is a physical and/or virtual network fabric, wherein the fabric switches are physical and/or virtual fabric switches, and wherein the hosts are physical and/or virtual hosts.
3. The system as recited in claim 1, wherein the logic is further configured to determine address resolution protocol (ARP) entries associated with either local ARP connectivity for the switch controller or general subnet distribution in the network fabric, wherein the destination port and the egress port are physical and/or virtual ports, and wherein the device is a physical and/or virtual device.
4. The system as recited in claim 3, wherein destination ports included in each destination port set are ordered successively from the first host to the second host, or vice versa.
5. The system as recited in claim 1, wherein the switch controller further comprises logic configured to edit a forwarding table stored to each fabric switch capable of source routing that is capable of communicating with the switch controller with source-routing information for each path utilizing the fabric switch.
6. The system as recited in claim 1, wherein the switch controller further comprises logic configured to edit a forwarding table stored to each host capable of source routing connected to the network fabric that is capable of communicating with the switch controller with source-routing information for each path which includes the host capable of source routing.
7. The system as recited in claim 1, wherein each host capable of source routing further comprises logic configured to: receive source-routing information from the switch controller; edit a source-routing table that indicates a sequence of devices or hops between the host capable of source routing and each known destination address in the network fabric; and insert source-routing information into any egress packets.
8. The system as recited in claim 7, wherein the logic is executed by a network interface card (NIC) of each host capable of source routing.
9. The system as recited in claim 7, wherein the logic configured to insert the source-routing information into any egress packets comprises logic configured to: edit a packet header for each of the egress packets to include a source-routing tag (SR-Tag), the SR-Tag comprising source-routing information, enforcement options, and hop count information; and send the egress packets according to a first device or hop indicated by the source-routing information in the SR-Tag.
10. The system as recited in claim 9, wherein the SR-Tag further comprises: an enforcement options field; a hop count indicator field for indicating a current device or hop in a path; and a set of source-routing information for the path, comprising a bridge indicator associated with a logical port indicator and options thereof for each device or hop in the path.
11. A system for source routing packets, the system comprising: a network fabric comprising a plurality of fabric switches interconnected in the network fabric; and a switch controller, comprising logic configured to: configure the network fabric; determine one or more paths through the network fabric between any two hosts connected thereto; and create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto, wherein each fabric switch capable of source routing in the network fabric comprises logic configured to: receive source-routing information from the switch controller; edit a source-routing table that indicates a sequence of devices or hops between the fabric switch capable of source routing and each known destination address in the network fabric; receive a packet having no source-routing information included therein; edit a packet header for the received packet to include source-routing information, enforcement options, and hop count information in a source routing tag (SR-Tag); and forward the received packet according to a next hop indicated by the source-routing information in the SR-Tag.
12. The system as recited in claim 11, wherein the SR-Tag further comprises: an enforcement options field; a hop count indicator field for indicating a current device or hop in a path; and a set of source-routing information for the path, comprising a bridge indicator associated with a logical port indicator and options thereof for each device or hop in the path.
13. A computer program product for source routing packets, the computer program product comprising a computer readable storage device having program code embodied therewith, the program code readable/executable by a switch controller to: configure a network fabric comprising a plurality of fabric switches interconnected in the network fabric; determine one or more paths through the network fabric between any two hosts connected thereto; create a source-routing table to store the one or more paths through the network fabric between any two hosts connected thereto; store the one or more paths through the network fabric between any two hosts connected thereto to the source-routing table; and store, to a first host, at least one path through the network fabric originating from the first host in response to a determination that the first host comprises a forwarding table that includes source-routing information for packets being sent to other end hosts connected to the network fabric.
14. The computer program product as recited in claim 13, wherein the switch controller is configured to operate according to OpenFlow standards, and wherein one or more of the fabric switches are OpenFlow compliant, wherein the routing table comprises each of the one or more paths through the network fabric between any two hosts connected thereto, each of the one or more paths comprising: a first destination address corresponding to a first host accessible via the network fabric; a second destination address corresponding to a second host accessible via the network fabric; and a destination port set representing each device or hop between the first host and the second host, each destination port comprising a device identifier and an egress port identifier for the device or hop in the path, wherein destination ports included in each destination port set are ordered successively from the first host to the second host, or vice versa.
15. The computer program product as recited in claim 13, wherein the program code readable/executable by the switch controller is further configured to edit a forwarding table stored to each fabric switch capable of source routing that is capable of communicating with the switch controller with source-routing information for each path utilizing the fabric switch.
16. The computer program product as recited in claim 13, wherein the program code readable/executable by the switch controller is further configured to edit a forwarding table stored to each host capable of source routing connected to the network fabric that is capable of communicating with the switch controller with source-routing information for each path which includes the host capable of source routing.
17. A method for source routing packets, the method comprising: receiving or creating a packet using a network interface card (NIC) of a host connected to a network fabric comprising a plurality of fabric switches interconnected therein; determining a path through the network fabric by consulting a source-routing table stored to the host; storing source-routing information to a packet header for the packet, the source-routing information comprising the path; and sending the packet to a first device or hop indicated by the path in the source-routing information.
18. The method as recited in claim 17, further comprising: receiving source-routing information from a switch controller; and storing the source-routing information to the source-routing table.
19. The method as recited in claim 18, wherein the switch controller and the NIC are configured to operate according to OpenFlow standards, and the method further comprises receiving source-routing table details and rules from the switch controller via OpenFlow.
20. The method as recited in claim 17, wherein the storing the source-routing information to the packet header comprises storing the source-routing information in a source routing tag (SR-Tag), the SR-Tag comprising: an enforcement options field; a hop count indicator field for indicating a current device or hop in the path; and the source-routing information for the path, comprising a bridge indicator associated with a logical port indicator and options thereof for each device or hop in the path.